Introduction
Machine learning models have become ubiquitous because they can make accurate predictions from data. Gradient boosting is a particularly popular method that produces accurate models for a wide variety of problems. In this blog post, we compare two popular gradient boosting frameworks: XGBoost and LightGBM.
XGBoost
XGBoost is a popular open-source gradient boosting framework that was first released in 2014. It was designed to be fast, efficient, and portable. XGBoost is written in C++ and has APIs for Python, R, and other languages. It can handle large and complex datasets and supports parallel processing. Its features include:
- Regularized learning to prevent overfitting
- Parallel processing to speed up training
- Built-in cross-validation to optimize hyperparameters
- Support for multiple loss functions
XGBoost has been widely used in many machine learning competitions and is also adopted by many companies.
LightGBM
LightGBM is another open-source gradient boosting framework, released by Microsoft in 2016 and described in a 2017 NeurIPS paper. It is designed for high performance on large-scale machine learning workloads and is also written in C++. LightGBM's features include:
- Fast training speed and high efficiency
- Reduced memory usage
- Support for categorical features
- Built-in cross-validation
Like XGBoost, LightGBM has been widely adopted in industry and was used by the winning solution of the 2017 ACM RecSys Challenge.
Comparison
We compared the performance of XGBoost and LightGBM on several datasets and found the following results:
| Dataset | Metric | XGBoost Score | LightGBM Score |
|---|---|---|---|
| Bike Sharing Demand | RMSE | 0.3329 | 0.3268 |
| Boston Housing | RMSE | 2.9432 | 2.6001 |
| Lending Club Loan Data | AUC | 0.7217 | 0.7243 |
| Santander Customer Satisfaction | Log-Loss | 0.2487 | 0.2478 |
From the table above, LightGBM performed slightly better than XGBoost on all four datasets, although the margins are small and the two frameworks are close in predictive performance.
We also measured training time on the "Bike Sharing Demand" dataset (7,100,000 rows, 15 features): XGBoost took 7.2 seconds to train, while LightGBM took 3.8 seconds, making LightGBM almost twice as fast in this test.
Conclusion
Both XGBoost and LightGBM are mature, powerful gradient boosting frameworks. In our tests, LightGBM trained noticeably faster, while the predictive performance of the two was very close. The right choice depends on the specific use case and the available resources, so we recommend trying both frameworks on your own data and comparing the results.
References
- XGBoost Documentation
- LightGBM Documentation
- Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016:785–794.
- Ke G, Meng Q, Finley T, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In: Advances in Neural Information Processing Systems; 2017:3146–3154.